Algorithms for Data Science: Lecture 4
Author: Barna Saha
Abstract
We saw the Chernoff + union bound in action in the previous section, when we analyzed the outcome of reservoir sampling for items in $[1, 100]$ over $m$ iterations. There, the bad event $\mathrm{Bad}_i$ is the event that the number of times item $i$ is sampled falls outside the range $\frac{m}{100} \pm \frac{m}{200}$. By the Chernoff bound, $\Pr[\mathrm{Bad}_i]$ is minuscule for each $i$. Therefore, the probability that at least one of the bad events happens, which would leave us unconvinced about the uniformity of reservoir sampling, is at most $100 \times \text{minuscule}$, which is still small, by taking a union bound over the 100 bad events. Let's take another example and follow the above argument rigorously.
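The reservoir-sampling argument above can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated in the abstract: each of the $m$ iterations is taken to be an independent run of reservoir sampling with a reservoir of size 1 over the stream $1, \ldots, 100$, and the per-item bound uses the common two-sided multiplicative Chernoff form $\Pr[|X - \mu| \ge \delta\mu] \le 2e^{-\delta^2\mu/3}$ with $\mu = m/100$ and $\delta = 1/2$, which may differ from the exact form used in the lecture. The function names are hypothetical.

```python
import math
import random


def reservoir_sample_one(stream, rng):
    """Return one item chosen uniformly at random from the stream
    (reservoir sampling with a reservoir of size 1)."""
    chosen = None
    for t, item in enumerate(stream, start=1):
        # Replace the current choice with probability 1/t.
        if rng.random() < 1.0 / t:
            chosen = item
    return chosen


def count_bad_items(m, n_items=100, seed=0):
    """Run m independent samplings over items 1..n_items and count the items
    whose sample count falls outside m/n_items +/- m/(2*n_items)."""
    rng = random.Random(seed)
    counts = [0] * (n_items + 1)
    for _ in range(m):
        counts[reservoir_sample_one(range(1, n_items + 1), rng)] += 1
    mu = m / n_items
    return sum(1 for i in range(1, n_items + 1) if abs(counts[i] - mu) > mu / 2)


def chernoff_union_bound(m, n_items=100, delta=0.5):
    """Union bound over n_items bad events, each bounded by the assumed
    two-sided Chernoff form 2 * exp(-delta^2 * mu / 3), capped at 1."""
    mu = m / n_items
    return min(1.0, n_items * 2 * math.exp(-delta ** 2 * mu / 3))
```

Under these assumptions, `chernoff_union_bound(20000)` evaluates $200\,e^{-50/3}$, which is on the order of $10^{-5}$, while for very small $m$ the bound is vacuous and the function returns 1.0. Calling `count_bad_items(20000)` gives the empirical number of bad events for one seeded run, which can be compared against that bound.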
Similar resources
Algorithms for Data Science: Lecture on Finding Similar Items
Finding similar items is a fundamental data mining task. We may want to find whether two documents are similar to detect plagiarism, mirror websites, multiple versions of the same article etc. Finding similar items is useful for building recommender systems as well where we want to find users with similar buying patterns. In Netflix two movies can be deemed similar if they are rated highly by t...
Algorithms for Data Science: Lecture 3
1 Concentration Inequalities. Lemma 1 (Markov's inequality). Let $X$ be a non-negative random variable. For all $\lambda > 0$, $\Pr[X > \lambda] \le \frac{E[X]}{\lambda}$. Lemma 2 (Chebyshev's inequality). For all $\lambda > 0$, $\Pr[|X - E[X]| > \lambda] \le \frac{\mathrm{var}[X]}{\lambda^2}$. Lemma 3 (The Chernoff Bound: Upper bound). Let $X_1, X_2, \ldots, X_n$ be independent random variables taking values in $\{0, 1\}$ with $E[X_i] = p_i$. Let $X = \sum_{i=1}^{n} X_i$ and $\mu = E[X]$. Then the followin...
Algorithms for Data Science: Lecture on Clustering
Given a set of points with a notion of distance between points, group the points into some number of clusters so that members of a cluster are “close” to each other, while members of different clusters are far. The problem of clustering is ubiquitous. We may want to cluster documents by topic they represent, we may want to cluster the moviegoers by the types of movies they like, or cluster gene...
Algorithms for Data Science: Lecture 5
The balls-and-bins exercise that we did in Homework 1 is also useful for modeling hashing. A hash function $h$ from a universe $U = [0, 1, \ldots, m-1]$ into a range $[0, \ldots, n-1]$ can be thought of as a way of placing items from the universe into $n$ bins. The collection of bins is called a hash table. We can model the distribution of items in bins with the same distribution as $m$ balls placed randomly ...
CMSC 858F: Algorithmic Game Theory Fall 2010 Frugality & Profit Maximization in Mechanism Design
Recall from the previous lecture that in a combinatorial auction each bidder has an associated real-valued valuation function $V$ defined for each subset of items $S$. An allocation of items $S_1, S_2, \ldots, S_n$ among the bidders with valuation functions $V_1, V_2, \ldots, V_n$, respectively, is socially efficient if the allocation maximizes the social welfare $\sum_i V_i(S_i)$. Combinatorial auction is a very genera...